AIbase

# Multilingual Caption Generation

Paligemma2 3b Pt 896
PaliGemma 2 is a multimodal vision-language model that combines image and text inputs to generate text outputs. It supports multiple languages and is suitable for various vision-language tasks.
Image-to-Text Transformers
google
2,536
22
Paligemma 3b Ft Cococap 224
PaliGemma is a versatile, lightweight vision-language model (VLM) that supports multilingual input and output and is suitable for a range of vision-language tasks.
Image-to-Text Transformers
google
209
1
Paligemma 3b Pt 896
PaliGemma is a versatile, lightweight vision-language model (VLM) that accepts image and text inputs and generates text output, with multilingual support.
Image-to-Text Transformers
google
1,788
119
Paligemma 3b Ft Science Qa 224
PaliGemma is a versatile, lightweight vision-language model (VLM) that accepts image and text inputs and generates text output, making it suitable for a range of vision-language tasks.
Image-to-Text Transformers
google
113
1
© 2025 AIbase